Models for Socio-Environmental Data
Dynamic Models: Forecasting The Effects of Harvest on Lynx
June 06, 2019

Motivation

The Eurasian lynx (Lynx lynx) is a medium-sized predator with broad distribution in the boreal forests of Europe and Siberia. The lynx is classified as a threatened species throughout much of its range. There is controversy about legal harvest of lynx in Sweden. Proponents of harvest argue that allowing hunting of lynx reduces illegal kill (poaching). Moreover, Sweden is committed to regulate lynx numbers to prevent excessive predation on reindeer because reindeer are critical to the livelihoods of indigenous pastoralists, the Sami. Many environmentalists oppose harvest, however, arguing that lynx are too rare to remove their fully protected status. A similar controversy surrounds management of wolves in the Western United States.



A forecasting model for the abundance of lynx will help managers make decisions. You have data about the number of lynx family groups censused in the unit as well as annual records of lynx harvested from the unit. You will model the population using the deterministic model:

\[N_t=\lambda(N_{t-1}-H_{t-1}).\]

where \(N_{t}\) is the true, unobserved abundance of lynx and \(H_{t-1}\) is the number of lynx harvested during \(t-1\) to \(t\).

The parentheses in this expression reflect the fact that harvest occurs immediately after census, such that the next years population increment comes from the post-harvest population size.

What would be the model if harvest occurred immediately before census? Three months after census? Continuously throughout the year?


Assume the harvest is observed without error (\(N_t\)), notably our Swedish colleagues are amazing snow trackers and do a good job of estimating the number of family groups (if not the number of lynx) in a management region.

The challenge in this problem is that the observations of lynx abundance (family groups) is not the same as the observation of harvest (number of lynx). Fortunately, you have prior information on the proportional relationship between number of family groups and number of lynx in the population, i.e, \[\phi=\frac{f}{N}\], where \(f\) is the number of family groups and \(N\) is the population size, mean \(\phi=.163\) with standard deviation of the mean = .012.


R libraries needed for this lab

You need to load the following libraries. Set the seed to 10 to compare your answers to ours.

library(SESYNCBayes)
library(rjags)
library(MCMCvis)
library(HDInterval)
set.seed(10)
Diagram the Bayesian network

Develop a hierarchical Bayesian model (also called a state space model) of the lynx population in the management unit. Diagram the Bayesian network (the DAG) of knowns and unknowns and write out the posterior and factored joint distribution.

Fitting the Model
  1. Now you’ll estimate the marginal posterior distribution of the unobserved, true state over time (\(\mathbf{N}\)), the parameters in the model \(\lambda\) and \(\phi\) as well as the process variance and observation variance. You’ll also summarize the marginal posterior distributions of the parameters and unobserved states.

The data for this problem is located in the SESYNCBayes package and can be called using data(LynxFamilies). We’ve provided you with a few useful moment matching functions and have walked your through the process of selecting initial values.

data("LynxFamilies")
y <- LynxFamilies
# Levels of  Harvest to evaluate relative to goals for forecasting part.
h=c(0, 10, 25, 50, 75)
#Function to get beta shape parameters from moments
shape_from_stats <- function(mu = mu.global, sigma = sigma.global){
         a <-(mu^2-mu^3-mu*sigma^2)/sigma^2
         b <- (mu-2*mu^2+mu^3-sigma^2+mu*sigma^2)/sigma^2
        shape_ps <- c(a,b)
        return(shape_ps)
}

#get parameters for distribution of population multiplier, 1/p
shapes=shape_from_stats(.163,.012)
#check prior on p using simulated data from beta distribution
x = seq(0,1,.001)
p=dbeta(x,shapes[1],shapes[2])
plot(x,p,typ="l",xlim=c(.1,.3))

Before you begin it’s very helpful to use simulated data to the verify initial values and model. We simulate the true state by choosing some biologically reasonable values for model parameters and “eyeballing” the fit of the true state to the data. You can then use these simulated values for initial conditions (see the inits list below). This is of particular importantce because failing to give reasonable initial conditions for dynamic models can cause problems in model fitting. Remember, supply initial conditions for all unobserved quantities in the posterior distribution (even those that do not have priors).

set.seed(10)
 
endyr <- nrow(y)
n <- numeric(endyr+1)
mu <- numeric(endyr+1)
fg <- numeric(endyr+1)
phi <- 0.16
lambda <- 1.07
sigma.p <- 0.2
 
n[1] <- y$census[1]/phi # n in the unit of individuals
mu[1] <- n[1] # mean from deterministic model to simulate
fg[1] <- n[1]*phi # Nt in the unit of
 
for(t in 2:(endyr+1)){
  mu[t] <- lambda*(n[t-1] - y$harvest[t-1])
  n[t] <- rlnorm(1,log(mu[t]),sigma.p)
  fg[t] <- n[t]*phi
}
plot(y$year,y$census,ylim=c(0,100),
     xlab="Year",ylab="Family group",
     main = "Simulated data")
lines(y$year,fg[1:length(y$year)])

##visually match simulated data with observations for initial conditions
endyr = nrow(y)
n=numeric(endyr+1)
mu=numeric(endyr+1) #use this for family groups
lambda=1.1
sigma.p=.00001
n[1] = y$census[1]

for(t in 2: (endyr+1)){
    n[t] <- lambda*(y$census[t-1] - .16 * y$harvest[t-1])  #use this for family groups
    }
plot(y$year, y$census,ylim=c(0,100),xlab="Year", ylab="Population size", main="Simulated data")
lines(y$year,n[1:length(y$year)])

Here’s your starting code:

#Data for JAGS
data = list(
    y.endyr = endyr,
    y.a=shapes[1], 
    y.b=shapes[2],
    y.H=y$harvest,
    y=y$census,
    h=h
)

inits = list(
    list(
    lambda = 1.2,
    sigma.p = .01,
    N=n
    ),
    list(
    lambda = 1.01,
    sigma.p = .2,
    N=n*1.2),
    list(
    lambda = .95,
    sigma.p = .5,
    N=n*.5)
    )
  1. Write the JAGS model to estimate the marginal posterior distribution of the unobserved, true state over time (\(\mathbf{N}\)), the parameters in the model \(\lambda\) and \(\phi\) as well as the process variance and observation variance. Include a summary the marginal posterior distributions of the parameters and unobserved states.
  1. Check MCMC chains for model parameters, process variance, and latent states for convergence. This will probably require using the excl option in MCMCsummary.
  1. Conduct posterior predictive checks by simulating a new dataset for family groups (\(f_t\)) at every MCMC iteration. Calculate a Bayesian p value using the sums of squared discrepancy between the observed and the predicted number of family groups based on observed and simulated data, \[T^{observed} = \sum_{t=1}^{n}(f_{t}^{observed}-N_{t}\phi)^{2} \\T^{model} = \sum_{t=1}^{n}(f_{t}^{simulated}-N_{t}\phi)^{2}.\] The Bayesian p value is the proportion of MCMC iterations for which \(T^{model}>T^{obs}\).
  1. Assure yourself that the process model adequately accounts for temporal autocorrelation in the residuals—allowing the assumption that they are independent and identically distributed.

To do this, include a derived quantity \[e_t=y_t-N_t\phi\] in your JAGS code and JAGS object. Use the following code or something like it to examine how autocorrelation in the residuals changes with time lag. Write a paragraph describing how to interpret the plot produced by this function.

acf(unlist(MCMCpstr(z,param="e",func=mean)),main="", lwd = 3, ci=0)
  1. Plot the median of the marginal posterior distribution of the number of lynx family groups over time (1998-2016) including a highest posterior density interval. Include your forecast for 2017 (the predictive process distribution) in this plot.
  1. ADVANCED Due to licensing constraints related to the time that it takes to properly issue hunting permits/licenses, Lynx harvest decisions are made before the population is censused, even though harvest actually occurs shortly after the census.

Make a forecast of the number of family groups in 2018 assuming five alternative levels for 2017 harvest (0, 10, 25, 50, and 75 animals). Environmentalists and hunters have agreed on a acceptable range for lynx abundance in the unit, 26 - 32 family groups. Compute the probability that the post-harvest number of family groups will be below, within, and above this range during 2018. Tabulate these values. Hint: Set up a “model experiment” in your JAGS code where you forecast the number of lynx family groups during 2018 under the specified levels of harvest. Extract the MCMC chains for the forecasted family groups( e.g., fg.hat) using MCMCchains(coda_object, params = fg.hat) Use the ecdf function on the R side to compute the probabilities that the forecasted number groups will be below, within, or above the acceptable range.

A note about the data. Each row in the data file gives the observed number of family groups for that year in column 2 and that year’s harvest in column 3. The harvest in each row influences the population size in the next row. So, for example, the 2016 harvest influences the 2017 population size.

Use a lognormal distribution to model the true lynx population size over time. Use a Poisson distribution for the data model relating the true, unobserved state (the total population size) to the observed data (number of family groups).

An alternative approach, which is slightly more difficult to code, is to model the process as \(\text{negative binomia}(N_t|\lambda(N_{t-1}-H_{t-1}, \rho))\) and model the data as \(\text{binomial}(y_t| \text{round}(N_t\phi),p)\) where \(p\) is a detection probability. Explain why this second formulation might be better than the formulation you are using. (It turns out they give virtually identical results.)